bump zarr conventions; migrate type checking to pyright#199
Open
d-v-b wants to merge 9 commits into
Open
Conversation
| # Check if this variable already exists and is valid | ||
| if not force_overwrite and store_exists: | ||
| if utils.validate_existing_band_data(existing_dataset, var, ds): | ||
| assert existing_dataset is not None # guaranteed by store_exists |
| for var in ds.data_vars: | ||
| if hasattr(ds[var].data, "chunks"): | ||
| current_chunks = ds[var].chunks | ||
| assert current_chunks is not None # guaranteed by hasattr(..., "chunks") |
|
Codecov Report❌ Patch coverage is 📢 Thoughts on this report? Let us know! |
Route multiscales, spatial, and proj convention metadata through a single utils.build_convention_attrs() helper that delegates to zarr_cm.create_many, so zarr-cm validates each convention and emits its CMO. Previously the CMOs were hand-placed and the spatial:/proj:/multiscales keys assembled by hand, which skipped zarr-cm validation (e.g. multiscales' layout>=1 and derived_from=>transform rules) and duplicated key strings. Type the helper precisely with zarr-cm's TypedDicts (SpatialAttrs, GeoProjAttrs, MultiscalesAttrs, MultiConventionAttrs) and a CRSLike Protocol instead of dict[str, Any]. Multiscales data is still produced by the project's MultiscaleMeta model (it also covers the TMS encoding zarr-cm doesn't model), but its CMO and validation now go through zarr-cm. Output is byte-identical to before (verified against the golden-file snapshot tests). Add tests/test_conversion/test_convention_attrs.py covering the helper, including that zarr-cm now rejects an invalid multiscales layout. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Ignore root-level scratch files (test.py, tmp.json, cli_test.sh) and the machine-local .claude/ session directory so they stop showing in git status. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
This reverts commit 4a5b6e4.
zarr-cm's convention TypedDicts now use PEP 728 `extra_items`, which mypy does not support. Switch the type checker to pyright (which does) and upgrade to the zarr-cm main git dep that also ships the supporting fixes: - Mapping-covariant aggregate API + exported JsonType/JsonValue/JsonDict, so build_convention_attrs no longer needs a private `_core` import or a cast. - PEP 695 `type JsonValue` alias, so pydantic resolves MultiscaleGroupAttrs' ConventionMetadataObject field directly (removed the model_rebuild workaround). Toolchain: - pyproject: replace [tool.mypy] with [tool.pyright]; mypy -> pyright dev dep. - .pre-commit-config / CI lint job: run `uv run --frozen pyright`. Type fixes across src/ and tests/ to reach a clean pyright run (0 errors): - remove stale mypy `# type: ignore[...]` comments (pyright-unnecessary); - narrow NotRequired TypedDict access (`.get()` + guard) instead of making keys Required, preserving runtime validation of optional Sentinel-1/2 members; - replace casts on external/untyped data with runtime isinstance/validation; - model construction via `Model.model_validate(...)` instead of `**dict`. No `typing.Any` introduced. Runtime behavior unchanged. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
| ) | ||
|
|
||
| # calculate_default_transform sizes the grid, so width and height are populated | ||
| assert width is not None |
|
|
||
| # calculate_default_transform sizes the grid, so width and height are populated | ||
| assert width is not None | ||
| assert height is not None |
| CONVENTION_NAME = multiscales_cm.CMO["name"] | ||
| CONVENTION_DESCRIPTION = multiscales_cm.CMO["description"] | ||
| _CONVENTION_NAME = multiscales_cm.CMO.get("name") | ||
| assert _CONVENTION_NAME is not None |
| assert _CONVENTION_NAME is not None | ||
| CONVENTION_NAME = _CONVENTION_NAME | ||
| _CONVENTION_DESCRIPTION = multiscales_cm.CMO.get("description") | ||
| assert _CONVENTION_DESCRIPTION is not None |
| expected_uuid = multiscales_cm.CMO["uuid"] | ||
| if not any(c["uuid"] == expected_uuid for c in value): | ||
| expected_uuid = multiscales_cm.CMO.get("uuid") | ||
| assert expected_uuid is not None |
| scale_level_data["transform"] = multiscale_transform | ||
|
|
||
| # Add spatial properties | ||
| assert "spatial_shape" in overview_level # always populated by the producer above |
Three casts asserted a type derived from Any or a union without verifying it. Replace each with an isinstance guard that raises TypeError on violation, so the assumption is enforced at runtime instead of only asserted to the checker: - sentinel1_reprojection: rio.write_crs() returns Any -> verify xr.Dataset. - s2_multiscale: output_group[base_path] is Array | Group -> verify zarr.Group. - s2_multiscale: client.compute() returns Any -> verify distributed.Future. The remaining casts bridge types on data we already validated (model_dump / create_many output), satisfy protocol/TypeVar binding on `self`, or widen a TypedDict for a third-party API — a runtime check there adds no safety. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Contributor
Author
|
@lhoupert i forgot to tag you for review when I opened this 🤦 lmk if you want to review it, otherwise i am inclined to merge for speed |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Updates the data model to the latest GeoZarr-relevant Zarr conventions (via the
zarr-cmpackage) and overhauls the project's type-checking setup.The work landed in two parts:
1. zarr-cm upgrade
zarr_cm.create_manyvia a singleutils.build_convention_attrshelper, so zarr-cm validates each convention and emits its CMO — instead of hand-assemblingzarr_conventionsand thespatial:/proj:keys.zarr-cm(currently themaingit dependency, which ships: aMapping-covariant aggregate API, publicJsonType/JsonValue/JsonDictexports, and a PEP 695type JsonValuealias). The PEP 695 alias lets pydantic resolveMultiscaleGroupAttrs'ConventionMetadataObjectfield directly — nomodel_rebuild()workaround.schema_url/spec_urlchanged with the new convention releases (URL-only diffs).2. mypy → pyright migration
zarr-cm's convention TypedDicts now use PEP 728
extra_items, which mypy does not support. The type checker is switched to pyright (which does):pyproject.toml:[tool.mypy]→[tool.pyright];mypy→pyrightdev dependency.uv run --frozen pyright.Reaching a clean pyright run (0 errors across
src/+tests/) involved:# type: ignore[...]comments;NotRequiredTypedDict access with.get()+ guards instead of making keys Required (preserving runtime validation of genuinely-optional Sentinel-1/2 members);casts on external/untyped data with real runtime checks (CRS.from_user_input, a_as_bboxvalidator, defensiveget_zarr_groupresolution);Model.model_validate(...)rather than**dict.No
typing.Anyis used in the changed code. Runtime behavior is preserved.Verification
pyright: 0 errors (src+tests).ruff check+ruff format: clean.Upstream
Filed (and merged) several
zarr-cmimprovements surfaced by this work: aggregate-APIMapping/type exports and the PEP 695JsonValuealias.🤖 Generated with Claude Code